This first assignment requires applying the basic functions of the igraph package. The following datasets are going to be used: the actor attributes (imdb_actors_key.tsv) and the actor edges (imdb_actor_edges.tsv).
You have to complete the code chunks in this document, analyze the results, extract insights, and answer the short questions. Fill in the attached CSV with your answers: sometimes a number is enough, other times a short sentence or paragraph. Remember to change the header to your email.
In your submission, please upload both this document in HTML and the CSV with the solutions.
In this section, the goal is to load the given datasets, build the graph, and analyze basic metrics. Include the edge or node attributes you consider relevant.
Describe the values provided by the summary function on the graph object.
from igraph import *
import cairo
import pandas as pd
import plotly.express as px
actorsdf = pd.read_csv('/Users/francoandresbenvenuto/Downloads/imdb_actors_key.tsv', sep='\t', encoding='unicode_escape')
actorsdf.head(10)
| | id | name | movies_95_04 | main_genre | genres |
|---|---|---|---|---|---|
| 0 | 15629 | Rudder, Michael (I) | 12 | Thriller | Action:1,Comedy:1,Drama:1,Fantasy:1,Horror:1,N... |
| 1 | 5026 | Morgan, Debbi | 16 | Drama | Comedy:2,Documentary:1,Drama:6,Horror:2,NULL:3... |
| 2 | 11252 | Bellows, Gil | 33 | Drama | Comedy:6,Documentary:1,Drama:7,Family:1,Fantas... |
| 3 | 5150 | Dray, Albert | 20 | Comedy | Comedy:6,Crime:1,Documentary:1,Drama:4,NULL:5,... |
| 4 | 4057 | Daly, Shane (I) | 18 | Drama | Comedy:2,Crime:1,Drama:7,Horror:1,Music:1,Musi... |
| 5 | 12373 | Macfadyen, Angus | 24 | Drama | Action:1,Adventure:1,Documentary:1,Drama:7,Fam... |
| 6 | 3453 | Djola, Badja | 11 | Drama | Adult:1,Drama:7,Thriller:3 |
| 7 | 9878 | Twiggy | 12 | Music | Documentary:5,Drama:1,Music:4,Romance:2 |
| 8 | 4988 | Winfrey, Oprah | 21 | Music | Comedy:1,Documentary:8,Drama:1,Family:1,Music:... |
| 9 | 13032 | Champagne (II) | 11 | Adult | Adult:8,NULL:3 |
edges_actorsdf = pd.read_csv('/Users/francoandresbenvenuto/Downloads/imdb_actor_edges.tsv', sep='\t', encoding='unicode_escape')
edges_actorsdf.head(10)
| | from | to | weight |
|---|---|---|---|
| 0 | 17776 | 17778 | 6 |
| 1 | 5578 | 9770 | 3 |
| 2 | 5578 | 929 | 2 |
| 3 | 5578 | 9982 | 2 |
| 4 | 1835 | 6278 | 2 |
| 5 | 1835 | 1664 | 7 |
| 6 | 1835 | 1791 | 2 |
| 7 | 1835 | 6435 | 2 |
| 8 | 1835 | 10037 | 4 |
| 9 | 1835 | 10697 | 3 |
1) How many nodes are there?
print('Number of nodes:')
actorsdf['id'].nunique()
Number of nodes:
17577
2) How many edges are there?
print('Number of edges:')
len(edges_actorsdf.index)
Number of edges:
287074
Analyse the degree distribution. Compute the total degree distribution.
df_graph = Graph.DataFrame(edges_actorsdf, directed=False)
actorsdata = df_graph.get_vertex_dataframe()
actorsdata['amount_of_degrees'] = df_graph.degree(mode='all')
3) What does this distribution look like?
import plotly.express as px
fig1 = px.histogram(actorsdata, x="amount_of_degrees")
fig1.show()
4) What is the maximum degree?
max(df_graph.degree())
784
5) What is the minimum degree?
min(df_graph.degree())
1
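The degree computation above can be illustrated without igraph. The following is a minimal pure-Python sketch on a hypothetical toy edge list (the names `degree_counts`, `degree_distribution` and `toy_edges` are illustrative, not part of the assignment). It also shows why a graph built only from an edge list cannot report a minimum degree of 0: a vertex that appears in no edge is never created.

```python
from collections import Counter

def degree_counts(edges):
    """Return {vertex: degree} for an undirected edge list of (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def degree_distribution(deg):
    """Return {degree: number of vertices with that degree}."""
    return Counter(deg.values())

# Hypothetical toy edge list (not taken from the IMDb data).
toy_edges = [(1, 2), (1, 3), (1, 4), (2, 3)]
toy_deg = degree_counts(toy_edges)
toy_dist = degree_distribution(toy_deg)

print("max degree:", max(toy_deg.values()))
print("min degree:", min(toy_deg.values()))
print("distribution:", dict(toy_dist))
```

If isolated actors matter for the analysis, the vertex set would have to come from the actors table rather than from the edge list alone.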
igraph provides functions to calculate the diameter and the average path length. Consider whether you should take the weights, the directions, etc. into account.
6) What is the diameter of the graph?
df_graph.diameter()
16
7) What is the average path length of the graph?
df_graph.average_path_length()
4.890545545798965
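Both values above were computed without weights, so every edge counts as one step. As a rough illustration of what that means, here is a pure-Python breadth-first-search sketch on a hypothetical toy graph with two components. It assumes the convention of averaging only over pairs that are actually connected, which is how a disconnected graph can still yield a finite diameter and average path length; the function names are illustrative.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_stats(edges):
    """Return (diameter, average path length) over connected pairs only."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    longest, total, pairs = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, (total / pairs if pairs else float("nan"))

# Hypothetical toy graph: a path a-b-c-d plus a separate pair x-y.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
toy_diameter, toy_avg = path_stats(toy_edges)
print(toy_diameter, round(toy_avg, 4))
```

Using the edge weights as distances would not be meaningful here if a larger weight means a stronger tie between two actors, since a stronger tie should not make them farther apart.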
(Optional but recommended): Obtain the distribution of the number of movies made by an actor and the number of genres in which an actor starred. It may be useful for analyzing and discussing the results obtained in the following exercises.
actorsdf['amount_of_genres'] = actorsdf.genres.apply(lambda x: len(x.split(',')) )
actorsdf
| | id | name | movies_95_04 | main_genre | genres | amount_of_genres |
|---|---|---|---|---|---|---|
| 0 | 15629 | Rudder, Michael (I) | 12 | Thriller | Action:1,Comedy:1,Drama:1,Fantasy:1,Horror:1,N... | 10 |
| 1 | 5026 | Morgan, Debbi | 16 | Drama | Comedy:2,Documentary:1,Drama:6,Horror:2,NULL:3... | 6 |
| 2 | 11252 | Bellows, Gil | 33 | Drama | Comedy:6,Documentary:1,Drama:7,Family:1,Fantas... | 11 |
| 3 | 5150 | Dray, Albert | 20 | Comedy | Comedy:6,Crime:1,Documentary:1,Drama:4,NULL:5,... | 8 |
| 4 | 4057 | Daly, Shane (I) | 18 | Drama | Comedy:2,Crime:1,Drama:7,Horror:1,Music:1,Musi... | 8 |
| ... | ... | ... | ... | ... | ... | ... |
| 17572 | 16211 | Urrutia, Paulina | 10 | Romance | Comedy:1,Drama:2,NULL:4,Romance:2,Short:1 | 5 |
| 17573 | 4910 | Kay, Lisa (I) | 10 | Comedy | Comedy:5,Drama:1,Fantasy:1,NULL:2,Romance:1 | 5 |
| 17574 | 5746 | Sutherland, Kiefer | 43 | Drama | Action:2,Comedy:3,Documentary:10,Drama:7,Famil... | 13 |
| 17575 | 1645 | Glyde, Billy | 182 | Adult | Adult:139,Drama:1,NULL:38,Sci-Fi:3,Short:1 | 5 |
| 17576 | 8474 | Scott, Ridley | 24 | Family | Documentary:18,Family:2,NULL:2,Short:2 | 4 |
17577 rows × 6 columns
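The genre count above splits the `genres` string on commas, so a `NULL` entry is counted as a genre. The sketch below parses the string into per-genre counts and optionally skips `NULL`. Treating `NULL` as "no genre recorded" is an assumption about the dataset, and the helper names are hypothetical.

```python
def parse_genres(genres):
    """Split 'Adult:139,Drama:1' into {'Adult': 139, 'Drama': 1}."""
    counts = {}
    for item in genres.split(","):
        # rpartition keeps genre names that contain other punctuation intact.
        name, _, n = item.rpartition(":")
        counts[name] = int(n)
    return counts

def count_genres(genres, skip_null=True):
    """Number of distinct genres, optionally ignoring the 'NULL' entry."""
    parsed = parse_genres(genres)
    return sum(1 for name in parsed if not (skip_null and name == "NULL"))

# Full genres string of one actor shown in the table above.
example_genres = "Adult:139,Drama:1,NULL:38,Sci-Fi:3,Short:1"
print(count_genres(example_genres, skip_null=False))
print(count_genres(example_genres))
print(sum(parse_genres(example_genres).values()))
```

For this row the per-genre counts add up to the value in `movies_95_04`, which suggests each movie is assigned to exactly one entry.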
Obtain three vectors with the degree, betweenness and closeness for each vertex of the actors' graph.
actorsdata['betweeness'] = df_graph.betweenness()
actorsdata['closeness'] = df_graph.closeness()
actorsdata.head(10)
| vertex ID | name | amount_of_degrees | betweeness | closeness |
|---|---|---|---|---|
| 0 | 0 | 8 | 94.526366 | 0.168868 |
| 1 | 1 | 40 | 9914.327130 | 0.168930 |
| 2 | 2 | 29 | 134403.550820 | 0.196873 |
| 3 | 3 | 6 | 154.697492 | 0.167262 |
| 4 | 4 | 42 | 31566.548050 | 0.185074 |
| 5 | 5 | 40 | 25408.367555 | 0.181312 |
| 6 | 6 | 9 | 1181.540684 | 0.171957 |
| 7 | 7 | 7 | 13.781625 | 0.157331 |
| 8 | 8 | 25 | 1153.017469 | 0.168902 |
| 9 | 9 | 25 | 8810.661314 | 0.172383 |
Obtain the list of the 20 actors with the largest degree centrality. It can be useful to show a list with the degree, the name of the actor, the number of movies, the main genre, and the number of genres in which the actor has participated.
centrality_actors = pd.merge(actorsdf,actorsdata, left_on='id', right_on='name')
centrality_actors.sort_values('amount_of_degrees',ascending=False).head(20)
| | id | name_x | movies_95_04 | main_genre | genres | amount_of_genres | name_y | amount_of_degrees | betweeness | closeness |
|---|---|---|---|---|---|---|---|---|---|---|
| 12147 | 162 | Davis, Mark (V) | 540 | Adult | Action:1,Adult:429,Comedy:3,Crime:1,Documentar... | 10 | 162 | 784 | 9.318531e+05 | 0.249300 |
| 1761 | 1743 | Sanders, Alex (I) | 467 | Adult | Action:1,Adult:380,Adventure:1,Comedy:2,Docume... | 10 | 1743 | 610 | 5.572365e+05 | 0.245821 |
| 13442 | 1754 | North, Peter (I) | 460 | Adult | Action:1,Adult:389,Documentary:5,Drama:5,NULL:... | 8 | 1754 | 599 | 4.173385e+05 | 0.241765 |
| 11272 | 1802 | Marcus, Mr. | 435 | Adult | Adult:343,Crime:1,Documentary:2,NULL:86,Short:... | 6 | 1802 | 584 | 1.463808e+06 | 0.249964 |
| 4092 | 407 | Tedeschi, Tony | 364 | Adult | Adult:286,Adventure:1,Comedy:1,Documentary:2,D... | 11 | 407 | 561 | 6.721635e+05 | 0.245693 |
| 8354 | 164 | Dough, Jon | 300 | Adult | Adult:248,Adventure:1,Comedy:1,Documentary:1,D... | 8 | 164 | 555 | 8.636479e+05 | 0.248562 |
| 5968 | 179 | Stone, Lee (II) | 403 | Adult | Adult:310,Comedy:1,Documentary:1,Fantasy:2,NUL... | 7 | 179 | 545 | 3.393109e+05 | 0.238488 |
| 2236 | 176 | Voyeur, Vince | 370 | Adult | Action:1,Adult:303,Comedy:3,Crime:1,Documentar... | 10 | 176 | 533 | 3.810606e+05 | 0.245783 |
| 5752 | 175 | Lawrence, Joel (II) | 315 | Adult | Adult:257,Comedy:1,Documentary:1,Musical:1,NUL... | 7 | 175 | 500 | 2.851236e+05 | 0.241337 |
| 15511 | 160 | Steele, Lexington | 429 | Adult | Adult:340,Comedy:1,Documentary:4,Drama:1,Fanta... | 8 | 160 | 493 | 2.971735e+05 | 0.240841 |
| 16429 | 127 | Ashley, Jay | 309 | Adult | Adult:247,Adventure:1,Comedy:1,Documentary:2,D... | 8 | 127 | 490 | 1.794342e+05 | 0.238661 |
| 17062 | 1626 | Boy, T.T. | 336 | Adult | Action:1,Adult:271,Documentary:5,Drama:4,NULL:... | 8 | 1626 | 475 | 2.177450e+05 | 0.240556 |
| 10548 | 2108 | Jeremy, Ron | 280 | Adult | Adult:149,Adventure:1,Animation:1,Comedy:15,Do... | 14 | 2108 | 471 | 9.748544e+06 | 0.282720 |
| 8843 | 131 | Cannon, Chris (III) | 287 | Adult | Action:1,Adult:226,Comedy:2,Documentary:4,Dram... | 9 | 131 | 471 | 5.315881e+05 | 0.247557 |
| 6840 | 163 | Bune, Tyce | 267 | Adult | Adult:213,Comedy:3,Fantasy:2,NULL:48,Short:1 | 5 | 163 | 463 | 3.241937e+05 | 0.236232 |
| 4524 | 701 | Hanks, Tom | 75 | Family | Animation:3,Comedy:5,Documentary:32,Drama:6,Fa... | 13 | 701 | 457 | 1.977252e+06 | 0.305231 |
| 6587 | 1778 | Michaels, Sean | 252 | Adult | Adult:203,Documentary:4,Drama:1,Fantasy:1,NULL... | 6 | 1778 | 451 | 3.383991e+05 | 0.243856 |
| 1124 | 177 | Stone, Kyle | 278 | Adult | Adult:212,Comedy:3,Documentary:2,Fantasy:4,NUL... | 8 | 177 | 450 | 1.032512e+05 | 0.235648 |
| 537 | 1688 | Hardman, Dave | 319 | Adult | Adult:240,Adventure:1,Comedy:2,Drama:3,NULL:67... | 7 | 1688 | 438 | 2.977737e+05 | 0.240841 |
| 11387 | 1804 | Surewood, Brian | 244 | Adult | Adult:190,Comedy:2,Fantasy:1,Horror:1,NULL:49,... | 6 | 1804 | 428 | 1.729722e+05 | 0.237955 |
8) Who is the actor with the highest degree centrality?
The actor with the highest degree centrality is Mark Davis, with a degree of 784.
9) How do you explain the high degree of the top-20 list?
Degree centrality rises with the number of movies an actor participates in. Almost all of the top 20 are adult-film actors with several hundred movies each; they can act in more movies than Hollywood actors because adult movies are shorter and cheaper to make. Tom Hanks is an outlier: he is not from the adult movie industry and has only 75 movies, yet he reaches a degree of 457 thanks to his high level of popularity.
Obtain the list of the 20 actors with the largest betweenness centrality. Show a list with the betweenness, the name of the actor, the number of movies, the main genre, and the number of genres in which the actor has participated.
centrality_actors.sort_values('betweeness',ascending=False).head(20)
| | id | name_x | movies_95_04 | main_genre | genres | amount_of_genres | name_y | amount_of_degrees | betweeness | closeness |
|---|---|---|---|---|---|---|---|---|---|---|
| 10548 | 2108 | Jeremy, Ron | 280 | Adult | Adult:149,Adventure:1,Animation:1,Comedy:15,Do... | 14 | 2108 | 471 | 9.748544e+06 | 0.282720 |
| 4693 | 3284 | Chan, Jackie (I) | 59 | Comedy | Action:2,Comedy:13,Crime:4,Documentary:18,Fami... | 12 | 3284 | 135 | 4.716909e+06 | 0.287238 |
| 2563 | 564 | Cruz, Penélope | 46 | Drama | Adventure:1,Comedy:2,Documentary:5,Drama:6,Fam... | 13 | 564 | 182 | 4.330663e+06 | 0.295555 |
| 14433 | 14458 | Shahlavi, Darren | 16 | Action | Action:4,Comedy:3,Documentary:1,Drama:1,Fantas... | 9 | 14458 | 8 | 4.295503e+06 | 0.193886 |
| 15720 | 17308 | Del Rosario, Monsour | 20 | Action | Action:8,Drama:3,Fantasy:1,Horror:2,NULL:1,Rom... | 9 | 17308 | 6 | 4.267099e+06 | 0.163154 |
| 17458 | 285 | Depardieu, Gérard | 56 | Comedy | Adventure:1,Comedy:15,Crime:2,Documentary:11,D... | 11 | 285 | 159 | 4.037356e+06 | 0.278351 |
| 8799 | 13723 | Bachchan, Amitabh | 35 | Romance | Action:1,Comedy:1,Crime:1,Documentary:1,Drama:... | 13 | 13723 | 66 | 2.570247e+06 | 0.226349 |
| 10412 | 1529 | Jackson, Samuel L. | 97 | Drama | Action:3,Adventure:1,Comedy:3,Crime:3,Document... | 14 | 1529 | 427 | 2.539614e+06 | 0.309265 |
| 5517 | 5083 | Soualem, Zinedine | 65 | Comedy | Animation:1,Comedy:17,Crime:3,Documentary:1,Dr... | 12 | 5083 | 121 | 2.368164e+06 | 0.249825 |
| 15894 | 1923 | Del Rio, Olivia | 84 | Adult | Adult:64,Drama:1,Fantasy:2,NULL:14,Sci-Fi:1,Sh... | 6 | 1923 | 168 | 2.316388e+06 | 0.240033 |
| 559 | 2549 | Jaenicke, Hannes | 66 | Thriller | Action:2,Adventure:2,Comedy:3,Crime:10,Drama:1... | 12 | 2549 | 73 | 2.136980e+06 | 0.250736 |
| 15834 | 2623 | Hayek, Salma | 44 | Drama | Adventure:1,Animation:1,Comedy:3,Crime:2,Docum... | 14 | 2623 | 185 | 2.117390e+06 | 0.288701 |
| 568 | 13698 | Pelé | 10 | Romance | Comedy:1,Documentary:5,NULL:3,Romance:1 | 4 | 13698 | 7 | 2.098485e+06 | 0.192619 |
| 4157 | 4628 | Knaup, Herbert | 50 | Drama | Action:1,Adult:1,Comedy:4,Crime:5,Drama:14,NUL... | 11 | 4628 | 63 | 2.062585e+06 | 0.231765 |
| 9655 | 3213 | Goldberg, Whoopi | 109 | Comedy | Adventure:3,Animation:1,Comedy:16,Documentary:... | 14 | 3213 | 398 | 2.051621e+06 | 0.307760 |
| 13349 | 4550 | Roth, Cecilia | 23 | Drama | Comedy:1,Documentary:2,Drama:11,Family:3,Horro... | 8 | 4550 | 62 | 2.019247e+06 | 0.244677 |
| 246 | 3075 | Bellucci, Monica | 43 | Drama | Adventure:1,Comedy:4,Documentary:7,Drama:8,Fam... | 12 | 3075 | 81 | 2.006221e+06 | 0.273986 |
| 4524 | 701 | Hanks, Tom | 75 | Family | Animation:3,Comedy:5,Documentary:32,Drama:6,Fa... | 13 | 701 | 457 | 1.977252e+06 | 0.305231 |
| 5906 | 1057 | August, Pernilla | 31 | Drama | Action:2,Documentary:4,Drama:14,Family:1,Music... | 10 | 1057 | 44 | 1.937362e+06 | 0.256443 |
| 1531 | 5295 | Kier, Udo | 69 | Drama | Action:2,Animation:2,Comedy:10,Crime:2,Documen... | 15 | 5295 | 64 | 1.919261e+06 | 0.281793 |
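10) Who is the actor with the highest betweenness?
The actor with the highest betweenness is Ron Jeremy.
11) How do you explain the high betweenness of the top-20 list?
There is a correlation with fame: Hollywood stars such as Samuel L. Jackson and Salma Hayek have high betweenness because of their celebrity profiles. The list also contains actors with very low degree, such as Darren Shahlavi (8), Monsour Del Rosario (6) and Pelé (7), which suggests that they act as bridges between groups of actors that would otherwise be far apart.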
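To illustrate how a vertex with few connections can still rank high in betweenness, here is a pure-Python sketch of Brandes' algorithm for unweighted, undirected graphs, run on a hypothetical toy graph in which one vertex joins two triangles. This is an illustration of the measure, not the implementation igraph uses internally.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm; adj maps each vertex to an iterable of neighbours."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Back-propagate dependencies, farthest vertices first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once per direction.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical toy graph: two triangles joined through a single "bridge" vertex.
toy_adj = {
    "a1": ["a2", "a3", "bridge"], "a2": ["a1", "a3"], "a3": ["a1", "a2"],
    "b1": ["b2", "b3", "bridge"], "b2": ["b1", "b3"], "b3": ["b1", "b2"],
    "bridge": ["a1", "b1"],
}
toy_bt = betweenness(toy_adj)
print(toy_bt["bridge"], toy_bt["a1"], toy_bt["a2"])
```

The bridge vertex has only two neighbours, yet every shortest path between the two triangles passes through it, so it receives the highest score.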
Obtain the list of the 20 actors with the largest closeness centrality. Show a list with the closeness, the name of the actor, the number of movies, the main genre, and the number of genres in which the actor has participated.
centrality_actors[centrality_actors['closeness'] < 1].sort_values('closeness',ascending=False).head(20)
| | id | name_x | movies_95_04 | main_genre | genres | amount_of_genres | name_y | amount_of_degrees | betweeness | closeness |
|---|---|---|---|---|---|---|---|---|---|---|
| 2109 | 16747 | Armanis, Julian | 12 | Adult | Adult:11,Documentary:1 | 2 | 16747 | 6 | 24.000000 | 0.714286 |
| 14828 | 13582 | Fazira, Erra | 13 | Romance | Animation:1,Crime:1,NULL:2,Romance:9 | 4 | 13582 | 1 | 0.000000 | 0.666667 |
| 13001 | 16913 | Lim, Kay Tong | 11 | Drama | Comedy:3,Drama:3,Romance:2,Short:1,Thriller:1,... | 6 | 16913 | 1 | 0.000000 | 0.666667 |
| 6367 | 17822 | Lee, Mark (X) | 10 | Comedy | Comedy:4,Crime:2,Drama:1,Family:1,NULL:1,Roman... | 6 | 17822 | 1 | 0.000000 | 0.666667 |
| 17467 | 13581 | Hassan, Jalaluddin | 14 | Romance | Comedy:1,Drama:5,NULL:1,Romance:6,Sci-Fi:1 | 5 | 13581 | 1 | 0.000000 | 0.666667 |
| 2567 | 17804 | Kovac, Erik | 11 | Adult | Adult:8,Documentary:1,NULL:2 | 3 | 17804 | 6 | 1.000000 | 0.588235 |
| 9514 | 17803 | Sulik, Dano | 21 | Adult | Adult:17,Documentary:1,NULL:2,Romance:1 | 4 | 17803 | 6 | 1.000000 | 0.588235 |
| 7288 | 16740 | Novotny, Pavel | 15 | Adult | Adult:15 | 1 | 16740 | 4 | 21.000000 | 0.588235 |
| 6659 | 16745 | Bonnet, Sebastian | 17 | Adult | Adult:14,Documentary:1,NULL:1,Romance:1 | 4 | 16745 | 6 | 1.000000 | 0.588235 |
| 5377 | 16748 | Davidov, Ion | 10 | Adult | Adult:7,Documentary:1,NULL:2 | 3 | 16748 | 6 | 1.000000 | 0.588235 |
| 12756 | 13064 | Ridgeston, Lukas | 10 | Adult | Adult:6,Documentary:2,NULL:2 | 3 | 13064 | 6 | 1.000000 | 0.588235 |
| 11593 | 13289 | Ahn, Sung-kee | 22 | Romance | Action:2,Comedy:3,Drama:3,Fantasy:1,Mystery:1,... | 9 | 13289 | 4 | 108.166667 | 0.458333 |
| 2320 | 16746 | Paulik, Johan | 14 | Adult | Adult:5,Documentary:2,NULL:6,Romance:1 | 4 | 16746 | 5 | 0.000000 | 0.454545 |
| 17543 | 13293 | Lee, Beom-su | 19 | Romance | Action:1,Comedy:4,Drama:4,Family:1,Horror:1,NU... | 8 | 13293 | 7 | 115.666667 | 0.448980 |
| 85 | 17022 | Klauzner, Uri | 15 | Drama | Action:1,Crime:1,Drama:9,Short:1,War:3 | 5 | 17022 | 8 | 109.700000 | 0.428571 |
| 2632 | 17671 | Pyrpassopoulos, Giorgos | 11 | Comedy | Comedy:4,Drama:2,NULL:2,Romance:2,Short:1 | 5 | 17671 | 7 | 96.183333 | 0.421053 |
| 17103 | 16741 | Hanak, Ales (I) | 15 | Adult | Adult:15 | 1 | 16741 | 3 | 0.000000 | 0.416667 |
| 13875 | 16742 | Korsakov, Pavel | 10 | Adult | Adult:10 | 1 | 16742 | 3 | 0.000000 | 0.416667 |
| 16262 | 16744 | Kajc, Cage | 11 | Adult | Adult:10,NULL:1 | 2 | 16744 | 3 | 0.000000 | 0.416667 |
| 12734 | 17846 | Kafetzopoulos, Antonis | 13 | Comedy | Comedy:6,Drama:2,NULL:1,Romance:4 | 4 | 17846 | 6 | 145.450000 | 0.405063 |
12) Who is the actor with highest closeness centrality?
The actor with the highest closeness centrality is Julian Armanis with 0.714286
13) How do you explain the high closeness of the top-20 list?
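The top 20 actors have a short average distance to the other actors they can reach, so those actors reach them in few steps. Note, however, that most of them have a very low degree (several have a degree of 1), which suggests that they belong to small groups disconnected from the rest of the graph, where closeness is computed only over the few actors that are reachable. Their high closeness therefore does not mean they are central in the network as a whole.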
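The effect of disconnected components on closeness can be reproduced on a small example. The sketch below assumes that closeness is computed over reachable vertices only (the number of reachable vertices divided by the sum of distances to them) and uses a hypothetical toy graph with a six-vertex path and a separate three-vertex path; the function name is illustrative.

```python
from collections import deque

def closeness_by_component(adj):
    """Return {vertex: (closeness over reachable vertices, number reachable)}."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        reachable = len(dist) - 1
        total = sum(dist.values())
        result[s] = (reachable / total if total else float("nan"), reachable)
    return result

# Hypothetical toy graph: path p1-...-p6 and a separate path s1-s2-s3.
toy_adj = {
    "p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p2", "p4"],
    "p4": ["p3", "p5"], "p5": ["p4", "p6"], "p6": ["p5"],
    "s1": ["s2"], "s2": ["s1", "s3"], "s3": ["s2"],
}
toy_close = closeness_by_component(toy_adj)
for v in ("p3", "s1", "s2"):
    print(v, round(toy_close[v][0], 6), toy_close[v][1])
```

The end vertex `s1` of the three-vertex path has degree 1 and a closeness of 2/3, the same combination that appears several times in the table above, while the middle vertex `s2` reaches exactly 1, the value excluded by the `closeness < 1` filter. Keeping the number of reachable vertices next to each score makes it possible to restrict the ranking to the largest component.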
Explore the Erdős-Rényi model and compare its structural properties to those of real-world networks (actors):
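As a starting point for this comparison, the following pure-Python sketch generates a G(n, p) random graph and derives the edge probability that would match the actors' network on average, using the node and edge counts reported earlier in this document. The function names are illustrative. Looping over all vertex pairs is only practical for small n, so a library generator would be needed at the real size.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """G(n, p): include each of the n*(n-1)/2 possible edges with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

def mean_degree(n, edges):
    """Average degree of an undirected graph with n vertices."""
    return 2 * len(edges) / n

# Counts reported earlier in this document.
n_actors, m_actors = 17577, 287074
# Edge probability giving the same expected number of edges (the graph density).
p_match = 2 * m_actors / (n_actors * (n_actors - 1))
print("matching p:", p_match)
print("expected mean degree:", p_match * (n_actors - 1))

# Small demonstration graph; the seed is arbitrary.
demo_n = 200
demo_edges = erdos_renyi(demo_n, 0.05, seed=42)
print("demo mean degree:", mean_degree(demo_n, demo_edges))
```

In such a graph the degrees concentrate around the mean, so a maximum degree as far above the average as the 784 observed for the actors would be extremely unlikely, which is one structural difference worth discussing.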
Use any community detection algorithm for the actors' network and discuss whether the communities found make sense according to the vertex labels.
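Whichever algorithm is chosen, its output is a membership vector that can be scored in two ways: by modularity, and by how well each community agrees with a vertex label such as `main_genre`. The sketch below shows both on a hypothetical toy graph with made-up labels. It assumes an unweighted, undirected graph without self-loops or duplicate edges, and the function names are illustrative.

```python
from collections import Counter

def modularity(edges, membership):
    """Newman modularity Q = sum_c [ e_c / m - (d_c / 2m)^2 ]."""
    m = len(edges)
    inside = Counter()      # e_c: edges with both ends in community c
    degree_sum = Counter()  # d_c: total degree of community c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

def majority_label_share(membership, labels):
    """For each community, the fraction of members carrying its most common label."""
    per_comm = {}
    for v, c in membership.items():
        per_comm.setdefault(c, Counter())[labels[v]] += 1
    return {c: cnt.most_common(1)[0][1] / sum(cnt.values())
            for c, cnt in per_comm.items()}

# Hypothetical toy graph: two triangles joined by one edge, with made-up labels.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_membership = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
toy_labels = {0: "Drama", 1: "Drama", 2: "Comedy", 3: "Adult", 4: "Adult", 5: "Adult"}

toy_q = modularity(toy_edges, toy_membership)
toy_share = majority_label_share(toy_membership, toy_labels)
print(round(toy_q, 4), toy_share)
```

A community whose majority label share is close to 1 is dominated by a single genre, which would support the claim that the detected communities make sense according to the vertex labels.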